-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
seccomp: switch default to ENOSYS #573
seccomp: switch default to ENOSYS #573
Conversation
5f489b3
to
40d6e52
Compare
Tests are not happy. |
@@ -1,5 +1,6 @@ | |||
{ | |||
"defaultAction": "SCMP_ACT_ERRNO", | |||
"defaultErrnoRet": 38, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a magic number?
pkg/seccomp/seccomp.json
Outdated
"vserver" | ||
], | ||
"action": "SCMP_ACT_ERRNO", | ||
"errnoRet": 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a magic number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is already too late for errnoRet as we already support it. Perhaps add a errnoRetString
that can be used in place of errnoRet
?
We could make defaultErrnoRet
a string, but I don't see a way to convert from a name to an errno list in the std library, so we would need to do the conversion ourselves.
40d6e52
to
dd4bab0
Compare
tests are happy now |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open a test PR against Podman?
We've regressed a number of the times recently in c/common and caught them while vendoring the release into Podman.
good idea. Opened a PR here: containers/podman#10451 |
5ab4b05
to
fcaaf65
Compare
containers/podman#10451 failures don't seem related to the seccomp changes, but are related to the new runc release. Some tests need to be fixed |
@giuseppe Still working on this? |
{ | ||
Names: []string{ | ||
"bdflush", | ||
"clone3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has to be treated the same way as clone
. Ideally, we would allow this system call because it is required for supporting certain forms of security hardening. But we definitely will need ENOSYS
for clone3
.
"io_pgetevents", | ||
"io_uring_enter", | ||
"io_uring_register", | ||
"io_uring_setup", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io_uring_*
should be ENOSYS
, too.
"pciconfig_write", | ||
"pkey_alloc", | ||
"pkey_free", | ||
"pkey_mprotect", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pkey_*
functions should really be permitted because they enable security hardening. Otherwise, ENOSYS
, not EPERM
is needed for them.
"pkey_alloc", | ||
"pkey_free", | ||
"pkey_mprotect", | ||
"rseq", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any harm in permitting rseq
. In the future, it will be an important performance enhancement for some workloads.
"io_uring_setup", | ||
"kexec_file_load", | ||
"kexec_load", | ||
"membarrier", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
membarrier
should default to ENOSYS
if it can't be permitted.
"get_mempolicy", | ||
"mbind", | ||
"set_mempolicy", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why those are separate from move_pages
and migrate_pages
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these fails with EPERM only when the container is not granted CAP_SYS_NICE. Otherwise seccomp won't block them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. move_pages
and migrate_pages
seem to have similar constraints. I also believe that these system calls do not require CAP_SYS_NICE
in all cases. See the discussion of the MPOL_MF_MOVE_ALL
flag in the manual pages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you suggest enabling them in any case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, they are similar to madvise
and not really privileged, so I think they can be enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. Added a patch to always allow them
Add support to specify the default errno return value. The OCI runtime specs already have support for it, and both crun (>= 0.19) and runc (>= 1.0-rc95) have support for it. Signed-off-by: Giuseppe Scrivano <[email protected]>
add the currently blocked syscalls to a deny-list and switch the default to ENOSYS. Signed-off-by: Giuseppe Scrivano <[email protected]>
fcaaf65
to
81cd342
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@fweimer-rh thanks for the suggestions. I agree with them and I'll implement them as a follow-up PR. I am still struggling to get this in as some Podman CI tests are failing with the new ubuntu/fedora images needed for having runc-1.0-rc95. Progress here: containers/podman#10451 As soon as the tests pass, I will polish the Podman PR to update the images and we can merge this one |
Podman tests are finally green. If we merge the PR, I will cleanup the Podman PR to vendor a release of containers/common and we can then target the new changes to the seccomp policy |
/lgtm |
@fweimer-rh addressed your comments in #627, except for mbind, set_mempolicy, get_mempolicy, but I can add another patch to address it |
seccomp: tweak default profile (followup for #573)
Commit 491daf4 ("iv_fd_epoll: Add support for epoll_pwait2().") added support for epoll_pwait2(), with a fallback to epoll_wait() in case epoll_pwait2() is not supported by the kernel we are running on, which would be indicated by epoll_pwait2() returning -ENOSYS. Some reports (e.g. axoflow/axosyslog#85 , #33 (comment) ) suggest that some container technologies can cause -EPERM to be returned for epoll_pwait2(), independently of whether or not epoll_pwait2() is actually supported by the kernel we are running on, and this trips us up because we don't currently handle -EPERM gracefully, as we did not expect that we would have to do so. Making system calls return -EPERM to indicate that they were filtered out by a security policy framework seems somewhat dubious, especially when considering the amount of application and user confusion generated by system calls that are not documented as being able to fail with -EPERM now suddenly being able to fail with -EPERM, but there is not much we can do about this. I would be against adding EPERM-as-ENOSYS fallbacks for every current or future case where we handle ENOSYS, but: 1. it seems that this is the only case where this triggers; 2. upstream seems to agree that this EPERM behavior is a bug (see e.g. these links dug up by László Várady: containers/common#573 , containers/podman#10337 , opencontainers/runtime-spec#1087 ), so there will hopefully be no new cases of this in the future; 3. there's at least one container technology release (podman on CentOS 7) where this bug triggers and where the platform is sufficiently old to no longer be receiving updates, as pointed out by Balazs Scheidler, so this issue can't be fixed by users updating their container software. Under these circumstances, adding a workaround on our end seems reasonable, and this commit does so. This issue was originally reported by @mstopa-splunk on GitHub. Workaround originally by Balazs Scheidler. Signed-off-by: Lennert Buytenhek <[email protected]>
add the currently blocked syscalls to a deny-list and switch the default to ENOSYS.
Signed-off-by: Giuseppe Scrivano [email protected]